ABSTRACT

Surveys for exoplanetary transits are usually limited not by photon noise
but rather by the amount of red noise in their data. In particular, although
the CoRoT space-based survey data are being carefully scrutinized,
significant new sources of systematic noise are still being discovered.
Recently, a magnitude-dependent systematic effect was discovered in the
CoRoT data by Mazeh & Guterman et al. and a phenomenological correction was
proposed. Here we tie the observed effect to a particular type of effect,
and in the process generalize the popular Sysrem algorithm to include
external parameters in a simultaneous solution with the unknown effects. We
show that a post-processing scheme based on this algorithm performs well and
indeed allows for the detection of new transit-like signals that were not
previously detected.
1 INTRODUCTION

The limiting factor for most planetary transit surveys is not the
theoretical photon noise but rather the practically achieved red noise from
non-astrophysical sources (Pont et al. 2006). The most capable transit
surveys are the space-based surveys CoRoT¹ and Kepler, because their stable
environments allow for minimal red noise (and other benefits, such as
continuous observations). Still, no instrument is perfect, and the CoRoT
light curves are known to show a number of significant effects that hinder
transit detection, among them: discontinuities of arbitrary magnitude due to
the high-energy proton flux near the South Atlantic Anomaly (SAA), residuals
at the CoRoT orbital period, spacecraft jitter, long-term CCD aging, and
more. Many of these effects are correctable to a satisfactory level in
post-processing, but not for all stars and not at all times.

On top of these effects, Mazeh & Guterman et al. (2009) (hereafter MG09)
recently discovered that there are significant magnitude-dependent
systematic effects in the CoRoT light curves, and they developed a
phenomenological algorithm to correct for them. In this paper we tie the
above effect to a particular type of effect, namely added or subtracted
linear flux, and are thus able to improve on their correction. In the
process we generalize the Sysrem algorithm (Tamuz et al. 2005) to include
arbitrary external parameters and show the benefits of using this modified
version. Below, we formulate the Sysrem generalization in §2, which is part
of a complete post-processing scheme presented in §3, and conclude in §4.



2 THE SARS CORE 
2.1 Algorithm 

MG09 first noted that there are magnitude-dependent systematic effects in
the CoRoT data. They proposed to correct for the effects by fitting a
parabola to the residuals of each exposure, but this correction is purely
phenomenological since there is no identified cause for the effects, and
thus no explanation as to why a parabola is the best functional form. We
hypothesize that the underlying physical mechanism MG09 were trying to
correct for is a constant flux that is either added to or subtracted from
all the light curves due to calibration errors, scattered light, or other
causes. Such an additive effect will create a large magnitude difference on
faint stars and a small magnitude difference on bright stars, as MG09 had
originally observed. Indeed, the original authors had also considered this
option (Tsevi Mazeh, personal comm.) but they chose to use a more
phenomenological correction rather than to tie the correction to this
proposed physical mechanism. Since detrending algorithms cannot a priori
disentangle additive from relative effects, we choose to simultaneously
correct both types of effects, and so developed the "Simultaneous Additive
and Relative Sysrem", or SARS, algorithm described below.



1 The CoRoT space mission, launched on December 27th 2006, has been
developed and is operated by CNES, with the contribution of Austria,
Belgium, Brazil, ESA, Germany, and Spain. CoRoT data become publicly
available one year after release to the Co-Is of the mission from the CoRoT
archive: http://idoc-corot.ias.u-psud.fr/.


Suppose a matrix of photometric measurements of N stars (i = 1...N) on M
measurements (j = 1...M) is given, so that the magnitude value of the i-th
star on the j-th frame is $m_{ij}$ and its associated error is
$\sigma_{ij}$. After removing from each stellar light curve (hereafter LC)
its mean or median $\bar{m}_i$ we are left with the matrix of residuals
$r_{ij}$. In the original Sysrem, the residuals of intrinsically constant
stars are modeled using two contributions:



$$ r_{ij} = A_j c_i + \mathrm{noise} \qquad (1) $$



where $A_j$ is the effect in each exposure and $c_i$ is the effect's
coefficient for each star. We note that since the data are in the magnitude
system, the effects found in this manner are relative in flux. In order to
test our hypothesis that the magnitude-dependent effects stem from something
that is additive in flux, we introduce the SARS model:



$$ r_{ij} = A_j x_{ij} c_{A,i} + R_j c_{R,i} + \mathrm{noise} \qquad (2) $$



Here the second term is exactly the usual Sysrem effect, where we simply
change the letter from $A_j$ to $R_j$ to designate that it is a relative
effect. The first term stands for the additive effect by introducing
$x_{ij} = 10^{0.4 m_{ij}}$, which makes sure that the effect is stronger for
faint stars and weaker for bright stars, in the functional form expected
from additive flux effects (for a small added flux $\delta F$ the magnitude
changes by $\Delta m \approx -1.086\,\delta F / F$, and
$1/F \propto 10^{0.4 m}$). In practice, we use
$x_{ij} = 10^{0.4 (m_{ij} - m_{\mathrm{ref}})}$, where $m_{\mathrm{ref}}$ is
a constant number (e.g., the median of all the stars on all the exposures),
to avoid overly large or overly small values for $x_{ij}$. As in Sysrem,
minimizing the sum of squared residuals S



$$ S = \sum_{i,j} \left( \frac{r_{ij} - \mathrm{model}}{\sigma_{ij}} \right)^2 \qquad (3) $$



gives the best-fitting effects $R$ and $A$, and the corresponding
coefficients $c_R$ and $c_A$ (eqs. [4] to [7]).

As in Sysrem, the values of $A_j$, $R_j$, $c_{A,i}$, $c_{R,i}$ are
iteratively refined until convergence is achieved. We found it important
that in each iteration the new values of the effects are used to calculate
the coefficients, and not the values from the previous iteration.
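
The iteration described above can be illustrated with a minimal Python sketch. It is not the authors' code: the update rules are the block-wise weighted least-squares minimizers of S in eq. (3) for the model of eq. (2), derived here one parameter family at a time, and they may differ in form from the paper's eqs. (4) to (7). The function names `sars_core` and `chi2`, the fixed iteration count `n_iter`, and the choice of the global median as $m_{\mathrm{ref}}$ are assumptions made for illustration. Coefficients are always computed from the freshly updated effects, as the text recommends.

```python
import numpy as np

def chi2(r, sigma, model):
    """Sum of squared, error-normalised residuals S of eq. (3)."""
    return float(np.sum(((r - model) / sigma) ** 2))

def sars_core(r, sigma, m, n_iter=30):
    """Illustrative alternating least-squares fit of
    r_ij ~ A_j * x_ij * cA_i + R_j * cR_i  (stars i = rows, exposures j = columns).
    """
    w = 1.0 / sigma**2
    # Additive-flux scaling; m_ref is assumed to be the global median magnitude.
    x = 10.0 ** (0.4 * (m - np.median(m)))
    # Deterministic starting point: per-exposure medians, unit coefficients.
    R = np.median(r, axis=0)
    A = np.median(r * x, axis=0)
    cR = np.ones(r.shape[0])
    cA = np.ones(r.shape[0])
    history = []
    for _ in range(n_iter):
        # Effects (one value per exposure), each with the other terms held fixed.
        b = x * cA[:, None]
        A = np.sum(w * (r - R * cR[:, None]) * b, axis=0) / np.sum(w * b**2, axis=0)
        g = cR[:, None]
        R = np.sum(w * (r - A * x * cA[:, None]) * g, axis=0) / np.sum(w * g**2, axis=0)
        # Coefficients (one value per star), using the NEW effects.
        b = A * x
        cA = np.sum(w * (r - R * cR[:, None]) * b, axis=1) / np.sum(w * b**2, axis=1)
        cR = np.sum(w * (r - b * cA[:, None]) * R, axis=1) / np.sum(w * R**2, axis=1)
        model = A * x * cA[:, None] + R * cR[:, None]
        history.append(chi2(r, sigma, model))
    return A, R, cA, cR, model, history
```

Because each update is the exact minimizer of S with respect to one family of parameters, S cannot increase from one sweep to the next, which gives a simple sanity check on any implementation. The cleaned residuals would be `r - model`.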

Further generalization: The formulae [4] to [7] do not "know" that $x_{ij}$
is meant to scale magnitude data to create a flux-based correction; they
only know that $x_{ij}$ depends on external information not present in the
original matrix of residuals. In this, SARS presents a significant departure
from the original Sysrem by allowing for detrending against any explicitly
known external parameters, as long as their effect can be encapsulated in
some $x_{ij}$. For example, these can be: distance from the center of the
CCD or pixel phase (or other location-based parameters), CCD temperature (or
other weather-related parameters), or Moon phase (or other temporal
effects), etc. It is thus easy to include multiple external effects in the
detrending model (e.g., as in the SARS model of eq. [2]) and, by minimizing
the sum of squared residuals S, to establish their effect on the data
simultaneously with the effects of unknown sources.
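
As an illustration of this generalization, one possible way to write a model with several known external parameters alongside one unknown relative effect is given below. The number of external parameters $K$ and the superscript $(k)$ are notation introduced here for illustration only; the case $K = 1$ with $x^{(1)}_{ij} = 10^{0.4 (m_{ij} - m_{\mathrm{ref}})}$ reduces to eq. (2).

```latex
r_{ij} = \sum_{k=1}^{K} A^{(k)}_j \, x^{(k)}_{ij} \, c^{(k)}_{i}
         + R_j \, c_{R,i} + \mathrm{noise}
```

Each $x^{(k)}_{ij}$ is a known matrix (for example, a function of CCD position or temperature), while every $A^{(k)}_j$, $c^{(k)}_i$, $R_j$ and $c_{R,i}$ is found by minimizing the same sum of squared residuals S.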



2.2 Suggested Good Practices 

Below we describe what we think are good practices when 
using SARS: 

• Starting point: We note that the Sysrem "search space" was already very
large: as many parameters to adjust as there are stars + exposures. By
simultaneously fitting more than one effect in SARS we enlarge this search
space greatly. In the original Sysrem the starting point was deemed to be
unimportant, since Tamuz et al. (2005) claimed that in their simulations the
same effect and coefficients were obtained no matter what initial values
were used. We believe that these simulations were somewhat lacking in that
they used white noise only, with no red noise component, which allowed them
to always find the (unique) global $\chi^2$ minimum with no local minima to
be avoided. We have performed a similar test on real, rather than simulated,
data and found that sometimes (in a few percent of the runs) the global
minimum was indeed missed. Fearing that the enlarged SARS search space would
worsen the problem, we choose to start the iterations from a deterministic
point. Assuming that the median of photometric measurements is rather
robust, we start by finding a proxy to the relative and additive effects by
$R_j = \mathrm{median}(r_{ij})$ and $A_j = \mathrm{median}(r_{ij} x_{ij})$,
respectively. We set all $c_R$ and $c_A$ to unity since we wish to find
effects that affect many (if not all) stars.

• Convergence criterion: The convergence criterion for the above iterations
was unspecified in Tamuz et al. (2005). We define convergence as the
iteration at which the maximal absolute value of the total correction,
abs($A_j x_{ij} c_{A,i} + R_j c_{R,i}$), is smaller than some fraction $f$
of the standard deviation of that particular object. We used $f = 0.5$ in
our processing.

• Once either additive or relative effects show no further correlation, one
can use the regular Sysrem to look for additional effects of the other type,
since one may have a different number of relative and additive effects,
until no effects of either type are identified.

• Bright stars both make planetary transits easier to detect and are more
susceptible to relative effects (which are later corrected by Sysrem, TFA
(Kovacs et al. 2005), or other algorithms). For these reasons some of the
transit surveys intentionally monitor only the relatively bright stars in
their field of view. On the other hand, fainter stars more readily show
additive effects. We therefore propose to add more faint stars to the data
of such surveys when using SARS, as they might hold the key to better
correcting all stars.



3 THE SARS POST-PROCESSING SCHEME 
3.1 Post-Processing Steps 

The above SARS core is just one element of the SARS post-processing scheme.
We were able to achieve complete automation, with no human input, from CoRoT
N2 FITS files to cleaned LCs. The global structure of the post-processing is
similar to the one used by MG09:

• Resample to 512s: resampling is done for each CoRoT 
color separately, if available. 

• Divide into ~10 d blocks, process blocks individually.

• Subtract a running median with a window the size of 3 
CoRoT orbital periods, and reject outliers 

• Choose a "learning set" to calculate the effects with. 

• Apply the effects to all stars (we used three pairs of 
effects) . 

• Re-set errors and reject bright outliers. 

However, in order to achieve full automation we elaborate below on steps
that were either unspecified or human-dependent in MG09:

• Outliers rejection is done in three tiers: 

(i) Removal of solitary outliers that are far from a small- 
window (5-point) median filter. 

(ii) Further outliers must meet two criteria: 1) the frame has an
anomalously high median absolute deviation (MAD; usually SAA-affected
frames), and 2) the data point is far from a 3-orbit median.

(iii) Before SARS-core application, frames must have a 
minimum number of valid learning-set stars (we used at 
least 100). 

• Automatic choosing of the learning set aims to isolate 
the intrinsically- and instrumentally- constant stars. These 
stars are assumed to be numerous and similarly-variable in 
the raw data. An initial learning set is chosen by multiple 
criteria of: 

(i) The Alarm statistic of the LC (Tamuz et al. 2006) 
must be part of the bulk of Alarms. 

(ii) The Alarm statistic of the residuals must be part of 
the bulk of Alarms. 

(iii) The locus of constant stars on the log(RMS) vs. Mag- 
nitude plot is along a straight line. Learning-set stars must 
not be far from that locus. 

Next, the learning set is refined by a procedure inspired by 
techniques originally developed for photometric follow up 
of transiting planets (Holman et al. 2006) and is aimed at 
delivering the best comparison signal (lowest relative noise): 

(i) Given a set of N stars, calculate the relative error on the total flux
for all N subsets of (N - 1) stars.

(ii) Compare the best subset (having the lowest relative flux error) with
the relative flux error of the sum of N stars:

- If error reduced: repeat from step (i) with (N - 1) stars

- If error increased: optimal set reached

This procedure guarantees that a local minimum in relative error is reached.
We opt not to search for the global optimum since this is deemed too
difficult (testing all subgroups of N stars requires testing N!
configurations, where N is of the order of several thousand). We note that
the resultant learning set preferentially includes faint stars (see Figure
1), which is at least partially because any variability is easier to spot on
brighter stars. Interestingly, Figure 1 and the second panel (panel B) of
Figure 2 show that despite the fact that faint stars are preferentially
selected as learning-set stars, bright stars are better corrected, showing
that indeed something was learned from the fainter objects and was well
applied to the brighter stars.

Figure 1. A representative plot of the fraction of learning-set stars in
several magnitude bins (data here are from the example presented in §3.2,
for the learning set of the first effects-pair in the first block of LRc02).


• We use the SARS core 3 times: 

(i) Use on the learning set only - used to re-calibrate the 
errors only (see below). 

(ii) Use on the learning set only (now with calibrated errors) - to
calculate the effects.

(iii) Use on the whole data set - apply the already- 
calculated effect to all the LCs. 
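
The greedy learning-set refinement described in the list above (drop one star at a time for as long as the relative error of the summed flux keeps decreasing) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `refine_learning_set` is hypothetical, and the relative error of the total flux is assumed to be the quadrature sum of the per-star flux errors divided by the summed flux.

```python
import numpy as np

def refine_learning_set(flux, err):
    """Greedily remove stars while the relative error of the total flux drops.

    flux, err: per-star mean flux and flux error (1-D arrays, same length).
    Returns the indices of the retained stars and the final relative error.
    """
    flux = np.asarray(flux, dtype=float)
    err = np.asarray(err, dtype=float)
    keep = np.arange(flux.size)
    current = np.sqrt(np.sum(err**2)) / np.sum(flux)
    while keep.size > 1:
        tot_flux = flux[keep].sum()
        tot_var = np.sum(err[keep] ** 2)
        # Relative error of every leave-one-out subset of (N - 1) stars.
        loo_var = np.maximum(tot_var - err[keep] ** 2, 0.0)
        loo = np.sqrt(loo_var) / (tot_flux - flux[keep])
        k = int(np.argmin(loo))
        if loo[k] < current:
            # Error reduced: drop that star and repeat with (N - 1) stars.
            keep = np.delete(keep, k)
            current = loo[k]
        else:
            # Error not reduced: a local optimum has been reached.
            break
    return keep, current
```

Each pass costs only O(N) operations because the leave-one-out totals are obtained by subtraction, so the whole refinement stays cheap even for learning sets of several thousand stars.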



Errors re-setting is done by:

$$ \sigma_{ij} = \mathrm{StarErr}(i) \, \frac{\mathrm{ExpErr}(j)}{\mathrm{median}(\mathrm{ExpErr})} \qquad (8) $$



Where StarErr is estimated from the star's LC and ExpErr 
is estimated from the distribution of magnitude residuals of 
each exposure. 
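
A minimal sketch of the error re-setting of eq. (8) is given below. The text does not specify how StarErr and ExpErr are estimated, so the MAD-based robust scatter used here, and the function names `robust_sigma` and `reset_errors`, are assumptions made for illustration.

```python
import numpy as np

def robust_sigma(a, axis):
    """Robust scatter estimate: 1.4826 * median absolute deviation."""
    med = np.median(a, axis=axis, keepdims=True)
    return 1.4826 * np.median(np.abs(a - med), axis=axis)

def reset_errors(resid):
    """Per-point errors sigma_ij = StarErr(i) * ExpErr(j) / median(ExpErr).

    resid: residual matrix with stars as rows and exposures as columns.
    """
    star_err = robust_sigma(resid, axis=1)  # one value per star
    exp_err = robust_sigma(resid, axis=0)   # one value per exposure
    return star_err[:, None] * exp_err[None, :] / np.median(exp_err)
```

With this normalization a typical exposure leaves a star's error unchanged, while exposures with anomalously large scatter (for example SAA-affected frames) are down-weighted for all stars.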



3.2 Results 

A comparison of the performance of SARS-cleaned and MG09-cleaned LCs (the
latter sometimes dubbed "CleanSet") of one random field (LRc02) is shown in
Figure 2: the SARS post-processing delivers lower LC dispersion than MG09's
CleanSet for about 65% of the stars, while keeping at least the same number
of valid data points M (CCD E2) if not more (CCD E1). If we compare the
Detection Power, defined as $DP \sim \sqrt{M}/\sigma$, we find that it is
higher in SARS than in CleanSet for up to 80% of the stars. We note that
there is a small trend in the relative performance: the brighter the stars
are, the better SARS is relative to CleanSet. This is the expected result of
the approximate functional form of the MG09 correction: since there are many
more faint stars in the data than bright stars, the parabolic least-squares
correction of MG09 tends to better suit the numerous faint ones, and so fits
the bright stars less well. Thus our initial hypothesis that the effects are
additive seems even more robust.

Figure 2. Comparison between SARS and CleanSet for one random field (CCD E1
of LRc02). Panel A: histogram of the ratio of LC dispersion in SARS and
CleanSet. Note that $\sigma_{\mathrm{SARS}} / \sigma_{\mathrm{CleanSet}} < 1$
for most stars. Panel B: the above ratio of LC dispersion vs. R magnitude;
the observed trend is explained in the text. Panel C: histogram of the
number of remaining data points after outlier rejection in SARS and
CleanSet. Panel D: histogram of the ratio of Detection Power (DP) between
SARS and CleanSet.

These global statistics also translate into specific detections: so far we
have SARS-analyzed three long runs (LRc02, LRa02, LRa01) and found in each
field about ten new transit-like signals that were not detected before. For
example, in Figure 3 we show four such new LRa02 transit candidates that
were also chosen for follow-up.



Figure 3. Example of four CoRoT LRa02 candidates that were not detected by
any of the detection teams before the application of the SARS
post-processing, and which are currently being followed up. The light curves
are folded and binned to aid visibility.



4 DISCUSSION

At the start of the CoRoT mission the CoRoT Exoplanets Science Team (CEST)
made the strategic choice of having multiple teams analyze the exact same
input data. By cross-checking each detection with different tools and
cleaning techniques (e.g., Cabrera et al. 2009, Carpano et al. 2009) the
CEST hoped to achieve the best possible transit candidates list for
follow-up observations by the limited ground-based telescope resources. Here
we present yet another step in the journey to clean photometric datasets in
general and CoRoT data in particular: we generalize the popular and
efficient Sysrem algorithm to include external parameters in a simultaneous
solution, together with the unknown effects. This allows us to show that
data from CoRoT are probably contaminated with additive, rather than
relative, systematic effects, and that these effects are the probable cause
behind the phenomenological observation of MG09. The size of the additive
correction abs($A_j x_{ij} c_{A,i}$) is comparable to that of the relative
correction abs($R_j c_{R,i}$), with a median additive-to-relative ratio of
about 0.5, but with a large scatter, making the additive correction larger
than the relative correction for ~1/3 of the data points. Additive effects
can arise from scattered light or erroneous bias or background subtraction,
and we believe that the additive effects can be used, just like the regular
Sysrem effects, to help track down the origin of the systematics and thus to
avoid them in the first place.

We believe that the main advantage of SARS is not in a dramatic change in
the standard deviation $\sigma$ of the LCs, but rather in the whiter color
of their noise, which in turn allows for a lower background of spurious
signals in the BLS spectra (Kovacs, Zucker & Mazeh 2002) and thus the
detection of shallower signals.

We were able to achieve good performance and complete automation, which
allows us to now process the entire mission data, and to look for, and find,
ever shallower transit-like signals. For example, on LRc02 target CoRoT ID
0105842933 we were able to clearly detect a very shallow signal, only
$10^{-4}$ magnitudes deep, with a period of P = 1.08085 d. Not only that,
but we were also able to show that this is an eclipsing binary, since at the
double period the odd and even eclipses have different depths, with the
secondary eclipse still visible (on a binned LC) while having a depth below
the 84 ppm depth of an exo-Earth around a Sun-like star.

We will make the SARS-cleaned light curves available to the CoRoT community,
and we intend to make the data generally available (upon request) once the
proprietary period is over. We note that we have also allowed for the
application of SARS to the residuals of astrophysically variable stars
(pulsators, eclipsing binaries, etc.), which will allow them to be better
cleaned too, as part of a parallel CEST effort to look for transiting
circumbinary planets (Ofir 2008, Ofir et al. 2009).